Trainable Tree Distance and an Application to Question Categorisation

Author

  • Martin Emms
Abstract

Continuing a line of work initiated in Boyer et al. (2007), the generalisation of stochastic string distance to a stochastic tree distance, specifically to stochastic Tai distance, is considered. An issue in modifying Zhang/Shasha tree-distance for stochastic variants is noted, a Viterbi EM cost-adaptation algorithm for this distance is proposed, and a counter-example to an all-paths EM proposal is noted. Experiments are reported in which a kNN categorisation algorithm is applied to a semantically categorised, syntactically annotated corpus. We show that a 67.7% baseline using standard unit costs can be improved to 72.5% by cost adaptation.

1 Theory and Algorithms

The classification of syntactic structures into semantic categories arises in a number of settings. A possible approach to such a classifier is to compute a category for a test item based on its distances to a set of k nearest neighbours in a pre-categorised example set. This paper takes such an approach, deploying variants of a tree-distance measure, which has been used with some success in a variety of semantically-oriented tasks such as Question-Answering, Entailment Recognition and Semantic Role Labelling (Punyakanok et al., 2004; Kouylekov and Magnini, 2005; Emms, 2006a; Emms, 2006b; Franco-Penya, 2010). An issue which will be considered is how to adapt the atomic costs underlying the tree-distance measure.

Tai (1979) first proposed a tree-distance measure. Where S and T are ordered, labelled trees, a Tai mapping is a partial, 1-to-1 mapping σ from the nodes of S to the nodes of T, which respects left-to-right order and ancestry, such as a
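The kNN-over-tree-distance idea described above can be illustrated with a minimal Python sketch. It assumes unit costs (the baseline setting mentioned in the abstract) and uses a plain memoised forest recursion rather than the Zhang/Shasha algorithm or the adapted stochastic costs the paper studies. The tree encoding and the names `forest_dist`, `tree_dist` and `knn_category` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from functools import lru_cache

# Assumed encoding: a tree is (label, children), children a tuple of trees.
# A forest is a tuple of trees. Unit costs: delete = insert = 1,
# substitution = 0 for equal labels and 1 otherwise.

@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost Tai distance between two ordered forests (naive recursion)."""
    if not f and not g:
        return 0
    if not f:
        _, kids = g[-1]
        # Insert the rightmost root of g; its children join the forest.
        return forest_dist(f, g[:-1] + kids) + 1
    if not g:
        _, kids = f[-1]
        # Delete the rightmost root of f; its children join the forest.
        return forest_dist(f[:-1] + kids, g) + 1
    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + f_kids, g) + 1,           # delete root in f
        forest_dist(f, g[:-1] + g_kids) + 1,           # insert root from g
        forest_dist(f_kids, g_kids)                    # map the two roots
        + forest_dist(f[:-1], g[:-1])
        + (0 if f_label == g_label else 1),
    )

def tree_dist(s, t):
    """Distance between two single trees."""
    return forest_dist((s,), (t,))

def knn_category(test, examples, k=3):
    """Majority category among the k examples nearest to `test`."""
    ranked = sorted(examples, key=lambda ex: tree_dist(test, ex[0]))
    votes = Counter(category for _, category in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data, invented purely for illustration.
a = ("S", (("NP", ()), ("VP", ())))
b = ("S", (("NP", ()), ("VP", (("V", ()),))))
c = ("Q", (("WH", ()),))
examples = [(a, "STATEMENT"), (b, "STATEMENT"), (c, "QUESTION")]
query = ("Q", (("WH", ()), ("V", ())))
nearest_category = knn_category(query, examples, k=1)
```

The recursion is only practical for small trees; a cost-adaptation scheme would replace the constant costs with values learned from training data.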

Similar resources

Automatic utterance type detection using suprasegmental features

The goal of the work presented here is to automatically predict the type of an utterance in spoken dialogue by using automatically extracted suprasegmental information. For this task, we present and compare three stochastic algorithms: hidden Markov models, artificial neural nets, and classification and regression trees. These models are easily trainable, reasonably robust and fit into the prob...

Adapting Tree Distance to Answer Retrieval and Parser Evaluation

The results of experiments on the application of tree-distance to an answer-retrieval task are reported. Various parameters in the definitions of tree-distance are considered, including whole-vs-sub tree, node weighting, wild cards and lexical emphasis. The results show that improving parse-quality maps to improved performance on this tree-distance answer-retrieval task. It is also shown that one o...

Variants Of Tree Similarity In A Question Answering Task

The results of experiments on the application of a variety of distance measures to a question-answering task are reported. Variants of tree-distance are considered, including whole-vs-sub tree, node weighting, wild cards and lexical emphasis. We derive string-distance as a special case of tree-distance and show that a particular parameterisation of tree-distance outperforms the string-distance ...

Clustering by Tree Distance for Parse Tree Normalisation

The application of tree-distance to clustering is considered. Previous work identified some parameters which favourably affect the use of tree-distance in question-answering tasks. Some evidence is given that the same parameters favourably affect the cluster quality. A potential application is in the creation of systems to carry out transformation of interrogative to indicative sentences, a fir...

Web Categorisation Using Distance-Based Decision Trees

In Web classification, web pages are assigned to pre-defined categories mainly according to their content (content mining). However, the structure of the web site might provide extra information about their category (structure mining). Traditionally, both approaches have been applied separately, or are dealt with using techniques that do not generate a model, such as Bayesian techniques. Unfortunatel...

Publication date: 2010